Author’s Abstract


A time service in a distributed system may be used both for multiprocess 
synchronization and for simply finding out what time it is. For synchroniza- 
tion, the time provided by different servers should be closely synchronized. 
For telling time, the time provided by each server should be a close approx- 
imation to Universal Time (the international time standard). Algorithms 
are presented for implementing a fault-tolerant time service that meets both 
requirements. 


Capsule Review 


Users of electronic mail are not surprised to see messages that are time- 
stamped after they were received. The naive blame the probably non- 
existent operator, “who didn’t get the time right.” The wary know that 
many computer systems go for long periods without stopping; during those 
periods adjustments are difficult or impossible because almost all time- 
dependent processes (for example, file systems, mailers, audit trails, back-up 
mechanisms, etc.) assume that the time provided by the local operating sys- 
tem is increasing smoothly. Hence a good time service not only must be close 
to the real time, but also must increase and maintain only a bounded rate 
of change. 

For some applications (for example, distributed data bases), there is a 
third requirement: the times maintained by various processors within a net- 
work must be very close to each other. This requirement is so stringent that 
it is not enough simply to ensure that the time provided by each processor 
is within a certain limit of the real time. Resynchronizations with the real 
time must therefore be coordinated. 

The final requirement of a good time service is that the resynchronization 
protocol must allow a certain number of faulty links or faulty processors, 
since network protocols ought to work in the presence of partial failures. 

Previous authors have presented algorithms that satisfy some of these 
requirements, but the algorithm described here is the first that satisfies all 
four simultaneously. Another attribute of the paper is that both the problem 
and the solution are precisely formulated. 


Andrei Broder 
1 Introduction


A time service provides the “current time” to its users. It performs two 
functions: 


• Telling a user the current time and date.

• Allowing different users to synchronize their activities.


Though related, these two functions are distinct. The first requires that the 
time provided be approximately equal to Universal Time—the ideal stan- 
dard that is approximated by the National Bureau of Standards’ broadcasts 
on station WWV. The second requires that the time provided to different 
users be approximately the same. 

Marzullo [6] devised algorithms for providing an accurate time and date, 
and a number of fault-tolerant synchronization algorithms have been pro- 
posed [2,4,5], but there has apparently been no previous work that consid- 
ered both functions at the same time. In this paper, I consider the problem 
of implementing a fault-tolerant time service that provides a single time 
value to perform both functions—more precisely, a time value that permits 
close to optimum synchronization and is a reasonable approximation to the 
correct time and date. 

The problem to be solved is stated somewhat informally in this intro- 
duction. Section 2 states the assumptions and conditions more precisely and 
defines some helpful notation. (A glossary is provided at the end of the paper 
to help the reader follow the notation.) Section 3 describes Marzullo’s algo- 
rithm for computing the best possible approximation to the correct time and 
date, and Section 4 develops an algorithm for a time service that provides 
the two functions. 

The algorithms assume a network of processes, in which each node has a 
local clock that runs at approximately the correct rate, and some nodes also 
have direct access to Universal Time, perhaps obtained by “listening” to 
WWV. A time server is implemented by synchronizing all the nodes’ clocks,
using the available information about Universal Time. 

The nodes providing the time service may be a subset of the nodes 
in the complete system—the other nodes interrogating the time servers to 
obtain time information. However, nodes that do not act as time servers 
or providers of Universal Time are ignored. The time service must function 
properly despite the failure of some network components. 

A time service could provide two distinct values to satisfy its two different
functions. For the function of providing the current time and date, a process
$p$ would provide a time interval $I_p$ that is its best approximation to
Universal Time. More precisely, $I_p$ would be the smallest interval that $p$
knows to contain $UT$, the current Universal Time. (It is most common to
write a value with error bound in the form $t \pm \epsilon$. However, it is
more convenient to work with the interval $[t - \epsilon, t + \epsilon]$ within
which the correct value is known to lie.)

For the second time-service function, $p$ must provide a time $T_p$, called
service time, that is close to the service time $T_q$ provided by any other
node $q$. More precisely, it should provide $T_p$ and, for each other node $q$,
a number $\delta_{pq}$ such that $|T_p - T_q| < \delta_{pq}$. However, this
condition is not sufficient, since it is trivially met by letting each $T_p$
always be zero. To represent time, $T_p$ should change at approximately the
same rate as $UT$, so the value of $T_p$ increases by about one second with the
passage of one second of Universal Time.

While not strictly necessary, it is convenient to have $T_p$ not only change
at approximately the same rate as $UT$, but be approximately equal to it.
Consider, for example, the problem of generating creation times for files.
One might want to use the creation time to decide which version of a file
is the current version. Since versions may be created at different nodes, a
file generated at node $p$ should use $T_p$ as its creation time to minimize
the likelihood that a version created at one node receives an earlier creation
time than a version created before it at a different node. However, one might
also want a creation time to tell the Universal Time at which the file was
created, so the user can determine the actual date and time of creation. This
could be accomplished by recording a separate “universal creation time”
derived from $I_p$. However, this additional value is not needed if $T_p$
provides an acceptable approximation to $UT$.

We therefore state the following three requirements for the time $T_p$
provided by node $p$, where $\kappa_p$, $\epsilon_p$, and the $\delta_{pq}$ are
values provided by $p$. (They could be constants that are announced when the
system is “turned on”, or they could be provided in response to user requests.)


correct rate  The rate of change of $T_p$ with respect to $UT$ lies between
$1 - \kappa_p$ and $1 + \kappa_p$.

synchronization  For every other node $q$: $|T_q - T_p| < \delta_{pq}$.

correct time  $|UT - T_p| < \epsilon_p$.


The synchronization requirement follows from the correct-time requirement
by letting $\delta_{pq} = \epsilon_p + \epsilon_q$. However, this may not
provide close enough synchronization. Universal Time is of interest mainly to
humans; synchronization algorithms depend only upon the differences between
the values of $T_p$ at the different nodes. Humans seldom need to know the
value of $UT$ to better than a few seconds, so an $\epsilon_p$ of several
seconds is acceptable. On the other hand, synchronization algorithms may have
more stringent requirements for $\delta_{pq}$. When nodes $p$ and $q$ are using
the time service for synchronization, $p$ generally incurs a delay of
$O(\delta_{pq})$ seconds because of the lack of synchrony of $T_p$ and $T_q$.
For example, if $q$ announces that it will release a resource at time $t$
(that is, when $T_q$ equals $t$), then $p$ must wait until $T_p$ reaches
$t + \delta_{pq}$ before acquiring the resource. Some applications might
require that this delay be kept to within a few microseconds.

While it is not necessary to keep the $\epsilon_p$ as small as the
$\delta_{pq}$, the required synchronization condition can be achieved if
$\epsilon_p$ could be kept small enough.
However, this is not always possible. The closeness with which clocks at 
different nodes can be directly synchronized depends upon the uncertainty 
in message transmission time between those nodes. Modern large networks 
are heterogeneous, and the uncertainty in transmission times may be very 
different for different pairs of nodes. Typically, a large network consists 
of a collection of local area networks (LANs) that are interconnected by 
point-to-point links. Nodes on a single LAN may be directly connected by 
a fiber-optic link, in which the uncertainty in transmission time can be as 
small as a few microseconds if the timing functions are performed at a low 
enough system level. The nodes in different LANs may communicate with 
one another by a store-and-forward protocol that could have an uncertainty 
of a second or more in transmission time.¹ If a particular LAN does not
include a direct source of Universal Time (such as a WWV receiver), so
nodes in the LAN must base their knowledge of $UT$ on messages received
from outside the LAN, then the values of $\epsilon_p$ and $\epsilon_q$ for $p$
and $q$ in the LAN could be several orders of magnitude greater than the best
achievable value of $\delta_{pq}$.

Marzullo [6] presented algorithms for obtaining a clock value from a set
of clocks, some of which may be faulty. These algorithms can be used to
provide the intervals $I_p$ that best approximate Universal Time, but they do
not satisfy the synchronization condition if
$\delta_{pq} < \epsilon_p + \epsilon_q$. Several Byzantine clock
synchronization algorithms have been presented that can be used to satisfy
the correct-rate and synchronization conditions in the presence of
failures [2,4,5]. However, to my knowledge, there have been no published
algorithms to achieve all three of the conditions above.

¹Such a large value results not from uncertainty in the physical transmission
times, but because the communication involves higher-level protocols,
separated from the physical messages by many layers of software. It is quite
likely that the timing of transmission delays can be done at a lower system
level for intra-LAN messages than for inter-LAN messages.

The major part of this report concerns algorithms for achieving the
synchronization condition when $\delta_{pq}$ may be much smaller than
$\epsilon_p + \epsilon_q$. This is
a nontrivial problem in the presence of failures, because even very simple 
kinds of failure act like malicious, “Byzantine” failures. For example, sup- 
pose a node is sending its clock value to all other nodes in the network. It 
does this by sending a message saying something like “my clock now reads 
11:47”. Suppose, through some hardware or software error, it pauses for five 
minutes in the middle of this broadcast. While it sends the same message to 
all nodes, it has essentially sent descriptions of two clocks that differ by five 
minutes. Thus, this simple error results in a “two-faced” clock that provides 
different clock values to different nodes. 

The correct-time requirement does not mention the interval $I_p$, which
represents $p$’s knowledge of $UT$. One might be tempted to replace this
requirement by the condition that $T_p$ be in $I_p$. However, such a condition
would be inconsistent with the other two requirements for $T_p$. It is
inconsistent with the correct-rate requirement because new knowledge of the
correct value of $UT$, such as the receipt of a message from a node with a WWV
receiver, could suddenly reduce the width of the interval $I_p$. Keeping the
value of $T_p$ within the interval $I_p$ could require a sudden change to
$T_p$, which is prohibited by the correct-rate requirement. It can also be
shown that, with malicious failures, requiring $T_p$ to be within $I_p$ could
require violating the synchronization requirement if
$\delta_{pq} < \epsilon_p + \epsilon_q$.


2 Notation and Assumptions 


In the introduction, the term node was used to emphasize that each node in 
the network provides a time service for user processes running at that node. 
To be consistent with the terminology commonly used in discussing clock 
synchronization, the term process will be used instead of node. 


2.1 Intervals 


The term interval is used to denote a closed interval on the real line, that
is, an interval of the form $[x,y]$ for $x \le y$. The width of the interval
$R$ is denoted by $\|R\|$, so $\|[x,y]\| = y - x$. The sum of two intervals is
defined by $[x,y] + [z,w] = [x + z, y + w]$. A real number $z$ is considered to
be the same as the interval $[z,z]$, so $[x,y] + z$ is defined to be the
interval $[x + z, y + z]$ obtained by translating the interval $[x,y]$ to the
right a distance of $z$. For any interval $U$ and number $\delta > 0$,
$U \pm \delta$ is defined to equal $U + [-\delta, \delta]$, so
$[x,y] \pm \delta = [x - \delta, y + \delta]$.

A pseudo-metric $d$ on intervals is a nonnegative, real-valued function
on pairs of intervals satisfying the following properties for all intervals
$U$, $V$, and $W$: (i) $d(U,V) = d(V,U)$, (ii) $d(U,V) + d(V,W) \ge d(U,W)$,
and (iii) $d(U, U + \epsilon) = \epsilon$ for any $\epsilon > 0$. A
pseudo-metric satisfying the additional property that $d(U,V) = 0$ implies
$U = V$ is called a metric. (A metric is sometimes called a distance function.)

Two important pseudo-metrics are 


• The midpoint pseudo-metric $d_m$, where $d_m(U,V)$ equals the distance
between the midpoints of $U$ and $V$.

• The uniform metric $d_u$, where $d_u([x,y],[v,w])$ equals the maximum of
$|v - x|$ and $|w - y|$.


As its name implies, the uniform metric is a metric. Note that for any
intervals $U$ and $V$, $d_m(U,V) \le d_u(U,V)$.
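
The interval operations and the two pseudo-metrics can be illustrated with a
short Python sketch. It is not part of the original report; the names
`Interval`, `pm`, `d_m`, and `d_u` are chosen here purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] on the real line, with lo <= hi."""
    lo: float
    hi: float

    def width(self) -> float:
        # ||[x, y]|| = y - x
        return self.hi - self.lo

    def __add__(self, other):
        # A real number z is identified with the interval [z, z].
        if not isinstance(other, Interval):
            other = Interval(other, other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def pm(self, delta: float) -> "Interval":
        # U ± delta is defined as U + [-delta, delta].
        return self + Interval(-delta, delta)


def d_m(u: Interval, v: Interval) -> float:
    """Midpoint pseudo-metric: distance between the midpoints."""
    return abs((u.lo + u.hi) / 2 - (v.lo + v.hi) / 2)


def d_u(u: Interval, v: Interval) -> float:
    """Uniform metric: larger of the two endpoint distances."""
    return max(abs(v.lo - u.lo), abs(v.hi - u.hi))
```

Widening an interval symmetrically leaves its midpoint fixed, so `d_m` cannot
distinguish `U` from `U.pm(delta)`; this is why the midpoint function is only
a pseudo-metric, while `d_u` reports the distance `delta`.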

A real-valued function $F$ on $m$-tuples of intervals is said to satisfy the
Lipschitz condition for a pseudo-metric $d$ if, for any intervals $U_i$ and
$V_i$ and any number $\delta > 0$: $d(U_i, V_i) < \delta$ for all $i$ implies
$|F(U_1,\ldots,U_m) - F(V_1,\ldots,V_m)| < \delta$. The function $F$ is said to
be translation invariant if
$F(U_1 + x,\ldots,U_m + x) = F(U_1,\ldots,U_m) + x$ for any intervals $U_i$ and
real number $x$. Satisfying the Lipschitz condition is a stronger requirement
than continuity and a weaker requirement than having a bounded derivative.
Translation invariance asserts that translating all arguments by a fixed
amount causes the value to be translated by the same amount. We expect
any functions appearing in a clock synchronization algorithm to be translation
invariant, since increasing all input clock values by a fixed amount
should produce a corresponding increase in the clock values computed by
the algorithm. Observe that if $F$ satisfies the Lipschitz condition for the
pseudo-metric $d_m$, then it also satisfies the condition for the metric $d_u$.


2.2 Clocks and Clock Ranges 


A time-dependent value is any real-valued function of a real variable. If $v$
is a time-dependent value, we interpret $v(t)$ to be the value of $v$ at
Universal Time $t$. A clock is a nondecreasing time-dependent value. If $V$ is
a clock, the value $V(t)$ represents the value read by clock $V$ at Universal
Time $t$. The identity function, denoted by $UT$ (so $UT(t) = t$), is a clock.
The service time $T_p$ provided by process $p$ is a clock, where $T_p(t)$
represents the time value provided by $p$ to a request for the current service
time received at Universal Time $t$. (Of course, $p$ does not need to know the
current value of Universal Time to compute the value of $T_p$ at that time.)

Let $V_1, \ldots, V_n$ be clocks. (Think of $V_p$ as a clock maintained by
process $p$.) The following definitions express the correctness conditions
introduced informally in the introduction. They describe conditions on these
clocks for an interval of time $[u,v]$, where $u$ and $v$ represent clock
values, that is, times indicated by the clocks themselves. Thus, the
conditions express properties of the clocks over intervals $[s,t]$ of
Universal Time such that $V_p(s) = u$ and $V_p(t) = v$ for a process $p$. The
conditions are expressed in this form because clock times are directly
observable; Universal Times are not. The bounds $\kappa_p$, $\delta_{pq}$, and
$\epsilon_p$ are time-dependent values. (In many cases, they will be
constants.)


correct rate with bounds $\kappa_p$: For each $p$ and any $x$ and $y$,
$x < y$, such that $V_p(x)$ and $V_p(y)$ are in $[u,v]$:

$$\int_x^y (1 - \kappa_p(t))\,dt \;\le\; V_p(y) - V_p(x) \;\le\; \int_x^y (1 + \kappa_p(t))\,dt$$

synchronization with bounds $\delta_{pq}$: For each $p$ and $q$, $p \ne q$,
and each $t$ such that $V_p(t)$ is in $[u,v]$:
$|V_q(t) - V_p(t)| < \delta_{pq}(t)$.

correct time with bounds $\epsilon_p$: For each $p$ and each $t$ such that
$V_p(t)$ is in $[u,v]$: $|UT(t) - V_p(t)| < \epsilon_p(t)$.


The correct-rate condition is defined in terms of integrals to avoid requiring
that the $V_p$ be differentiable functions of time. When no interval is
specified, these conditions are assumed to hold for all intervals.
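
To make the three conditions concrete, here is a minimal Python sketch that
checks them on sampled data. It is an illustration only and simplifies the
definitions above: the bounds are taken to be constants, clocks are modeled as
ordinary functions of Universal Time, and the restriction to a clock-value
interval $[u,v]$ is ignored. The function names are hypothetical.

```python
def correct_rate(clock, times, kappa):
    """Check (1 - kappa)(y - x) <= V(y) - V(x) <= (1 + kappa)(y - x).

    Only consecutive samples are compared; summing those inequalities
    gives the same bound for any pair of sample times.
    """
    return all(
        (1 - kappa) * (y - x) <= clock(y) - clock(x) <= (1 + kappa) * (y - x)
        for x, y in zip(times, times[1:])
    )


def synchronized(clock_p, clock_q, times, delta):
    """Check |V_q(t) - V_p(t)| < delta at every sample time."""
    return all(abs(clock_q(t) - clock_p(t)) < delta for t in times)


def correct_time(clock, times, epsilon):
    """Check |UT(t) - V(t)| < epsilon, where UT(t) = t."""
    return all(abs(t - clock(t)) < epsilon for t in times)
```

For example, a clock `lambda t: 1.001 * t` satisfies `correct_rate` with
`kappa = 0.01`, while a clock that starts half a second off can be well
synchronized with its neighbors yet fail `correct_time` for a small
`epsilon`.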

Each nonfaulty process $p$ is assumed to have a clock $C_p$, called its local
clock.² It is assumed that the local clocks of all nonfaulty processes satisfy
the correct-rate condition with bounds $\rho_p \ll 1$, where the $\rho_p$ are
constants. (An error in the local clock $C_p$ is considered to be a failure of
process $p$.)

A $p$-clock is a clock of the form $C_p + v$ for some constant $v$, that is, a
clock whose value at time $t$ is $v + C_p(t)$. A $p$-clock is one that runs at
the same rate as $p$’s local clock. Of course, $C_p$ is a $p$-clock. A
$p$-clock $V_p$ is determined by its value at any single time $t_0$, since
$V_p(t) = V_p(t_0) - C_p(t_0) + C_p(t)$. Note that $T_p$, the time-service
clock provided by $p$, will not in general be a $p$-clock.

²It is sufficient for $p$ to have a cyclic timer, since one can construct a
monotonic clock from such a timer. In fact, the algorithms are easily modified
to work with only a timer.

The following result is an easy consequence of the assumption that $C_p$
satisfies the correct-rate condition. It asserts that the uncertainty $\rho_p$
in the running rate of $p$’s local clock causes its knowledge of Universal
Time to degrade at a rate of $\rho_p$ seconds per second of elapsed time on its
local clock.


Proposition 1  If process $p$ is nonfaulty and $R$ is an interval such that
$UT(t_0) \in R$, then for all $t \ge t_0$:

$$UT(t) \in R + (1 \pm \rho_p)(C_p(t) - C_p(t_0))$$

where terms of order $\rho_p^2 (t - t_0)$ are neglected.
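
As a small illustration of Proposition 1, the sketch below ages an interval by
the elapsed local-clock time. It assumes the reading that
$(1 \pm \rho_p)\,\Delta C$ denotes the interval
$[(1 - \rho_p)\Delta C, (1 + \rho_p)\Delta C]$; the function name
`age_interval` is hypothetical and not taken from the report.

```python
def age_interval(interval, elapsed_local, rho):
    """Return R + (1 ± rho) * elapsed_local for R = (lo, hi).

    elapsed_local is C_p(t) - C_p(t0), the time that has passed on the
    local clock; rho bounds the local clock's rate error.
    """
    lo, hi = interval
    return (lo + (1 - rho) * elapsed_local,
            hi + (1 + rho) * elapsed_local)
```

The result moves right by the elapsed local time and widens by
`2 * rho * elapsed_local`, which matches the statement that knowledge of
Universal Time degrades at `rho` seconds per second on each side.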


A clock range is an interval-valued function on the reals of the form $[x,y]$
where $x$ and $y$ are clocks. In other words, $R$ is a clock range if there
exist clocks $x$ and $y$ such that $R(t) = [x(t), y(t)]$ for all times $t$. A
$p$-clock range is a clock range of the form $[x,y]$ such that $x$ and $y$ are
$p$-clocks. A $p$-clock range can be written as $U + C_p$ for some interval
$U$. Since a real number $x$ is identified with the interval $[x,x]$, a clock
is a special case of a clock range, and a $p$-clock is a special case of a
$p$-clock range. A $p$-clock range is determined by its value at any single
time.

If $F$ is a real-valued function on $m$-tuples of intervals, then applying $F$
to an $m$-tuple of clock ranges produces a time-dependent value. Let
$R_1, \ldots, R_m$ be $p$-clock ranges with $R_i = U_i + C_p$ for intervals
$U_i$. If $F$ is translation invariant, then

$$F(R_1, \ldots, R_m) = F(U_1, \ldots, U_m) + C_p$$

Thus, if the $R_i$ are $p$-clock ranges, then $F(R_1, \ldots, R_m)$ is a
$p$-clock.


2.3 The Network


I assume a network of processes connected by channels, where a channel
may connect more than two processes. The two kinds of channels that are
of interest are a point-to-point channel that has a single sender and a single
receiver, and a broadcast channel that connects a set of processes so that
any one of them can broadcast a message over it to all other processes on
the channel. A two-way communication line is a broadcast channel that
connects just two processes.

Certain of the processes, called Universal Time providers, are assumed
to have a direct source of Universal Time. Let $UT^{(j)}$ denote the value of
$UT$ obtained by process $j$. This value is a clock range that is known to
contain the correct value of Universal Time, so $UT(t) \in UT^{(j)}(t)$ for any
time $t$. Process $j$ will periodically broadcast $UT^{(j)}$ to other
processes.

For each channel $c$, I assume values $\tau^c_{\min}$ and $\tau^c_{\max}$ such
that a message sent over $c$ at time $t$ is received between times
$t + \tau^c_{\min}$ and $t + \tau^c_{\max}$, if the sender, the receiver, and
$c$ are all nonfaulty. More precisely, if an event, such as the receipt of
another message, that occurs at the sender at time $t$ causes the sending of a
message $M$ over channel $c$, then $M$ will be received between times
$t + \tau^c_{\min}$ and $t + \tau^c_{\max}$. Thus, the minimum and maximum
delays $\tau^c_{\min}$ and $\tau^c_{\max}$ include the time needed to generate
and send the message as well as the time the message was actually in transit
along channel $c$. The values of $\tau^c_{\min}$ and $\tau^c_{\max}$ may vary
with time, but they are assumed to be known to the receiver. Let $\gamma^c$
denote $\tau^c_{\max} - \tau^c_{\min}$.

Delivering a message with a delay less than $\tau^c_{\min}$ or greater than
$\tau^c_{\max}$ constitutes a channel failure. If there is unpredictable
variance in transmission delay, due, for example, to variation in the channel
loading, then $\tau^c_{\min}$ and $\tau^c_{\max}$ should be chosen
conservatively to reduce the probability of such a failure. (Note that
$\tau^c_{\min}$ can always be taken to be zero.) However, the time required for
fault-tolerant synchronization algorithms depends upon the values of
$\tau^c_{\max}$, not on the actual delays, and the bounds $\delta_{pq}$ on
clock synchronization depend upon $\gamma^c$, so tradeoffs between reliability
and efficiency must be made when choosing the values of $\tau^c_{\min}$ and
$\tau^c_{\max}$.

A path is a sequence of processes and channels, each channel connecting
successive processes. The null path connects process $p$ to itself. If $\pi$ is
a path from $p$ to $q$ and $\phi$ is a path from $q$ to $r$, then $\pi\phi$
denotes the obvious path from $p$ to $r$ via $q$.

For a path $\pi$ from $p$ to $q$, let $\tau^\pi_{\min}$ and $\tau^\pi_{\max}$
denote the sum of all $\tau^c_{\min}$ and $\tau^c_{\max}$, respectively, for
all channels $c$ in $\pi$. Thus, $\tau^\pi_{\min}$ and $\tau^\pi_{\max}$
represent the minimum and maximum transmission delays for a message relayed
from $p$ to $q$ along $\pi$. Define $\gamma^\pi$ to be
$\tau^\pi_{\max} - \tau^\pi_{\min}$, the uncertainty in transmission delay
along $\pi$.

Fault-tolerant synchronization algorithms require that a process know
the values $\tau^\pi_{\min}$, $\tau^\pi_{\max}$, and $\gamma^\pi$ for messages
it receives over the path $\pi$, which usually requires knowledge of the values
of $\tau^c_{\min}$ and $\tau^c_{\max}$ for each channel $c$ in the path. If
these values can change, then new values can be broadcast using the method of
[3], which assures that the same values are used by all processes. For
simplicity, I assume that the $\tau^\pi_{\min}$, $\tau^\pi_{\max}$, and
$\gamma^\pi$ are constants for each fixed path $\pi$.

If $R$ is an interval and $\pi$ a path, then $R^\pi$ is defined to be the
interval $R + [\tau^\pi_{\min}, \tau^\pi_{\max}]$. Suppose that $\pi$ is a
nonfaulty path from process $p$ to process $q$, and $R$ represents $p$’s
knowledge of Universal Time at a certain time $t$; that is, $p$ knows that
$UT(t) \in R$. If $p$ sends a message with the value $R$ along $\pi$ to $q$
and that message arrives at time $t'$, then since the transmission time of the
message is in the interval $[\tau^\pi_{\min}, \tau^\pi_{\max}]$, $q$ knows that
$UT(t') \in R^\pi$. Observe that if $\pi\phi$ is a path, then
$R^{\pi\phi} = (R^\pi)^\phi$.
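
The effect of relaying an interval along a path can be sketched in a few lines
of Python. This is an illustration under assumed data structures: a path is
represented simply as a list of per-channel `(tau_min, tau_max)` pairs, and
the function names are hypothetical.

```python
def path_delay(channels):
    """Sum per-channel delay bounds to get (tau_min, tau_max) for a path."""
    tau_min = sum(lo for lo, _ in channels)
    tau_max = sum(hi for _, hi in channels)
    return tau_min, tau_max


def relay(interval, channels):
    """Return R^pi = R + [tau_min, tau_max] for R = (lo, hi)."""
    lo, hi = interval
    tau_min, tau_max = path_delay(channels)
    return (lo + tau_min, hi + tau_max)


def uncertainty(channels):
    """Return gamma^pi = tau_max - tau_min for the path."""
    tau_min, tau_max = path_delay(channels)
    return tau_max - tau_min
```

Because the delay bounds add along a path, relaying over a concatenated path
gives the same interval as relaying over each segment in turn, and the width
of the interval grows by exactly the path's uncertainty.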

Synchronization algorithms require processes to send clock ranges to
one another. (Remember that a clock is a special case of a clock range.)
A process $p$ sends a $p$-clock range by sending a message with the clock
range’s current value $R$ along a path $\pi$. The receiving process $q$
interprets this message as the receipt of a $q$-clock range whose value, at the
time the message is received, is $R^\pi$. The following result asserts that
transmitting a clock range in this way causes an initial perturbation by a
distance of up to $\gamma^\pi$, after which the two clock ranges drift apart at
a rate of at most $\rho_p + \rho_q$, where distance is measured by the uniform
metric $d_u$ on intervals. This result is a simple consequence of the
correct-rate assumption for the local clocks and the assumed bounds on
message-transmission times.


Proposition 2  Let $R_p$ be a $p$-clock range, and suppose that process $p$
sends a message at time $t$ that is received at time $t'$ over a nonfaulty path
$\pi$, and let $R_q$ be the $q$-clock range such that
$R_q(t') = R_p(t)^\pi$. Then for any $\Delta t \ge 0$:

$$d_u(R_p(t + \Delta t), R_q(t + \Delta t)) \le \gamma^\pi + (\rho_p + \rho_q)\Delta t$$

where terms of order $\rho_q \tau^\pi_{\max}$ are neglected.


3 Obtaining Universal Time 


Let us now consider how a time server $p$ could provide the best possible
value of $I_p$, an interval known to contain $UT$. Assume that each Universal
Time provider $j$ maintains a clock range $UT^{(j)}$ that represents its
current knowledge of Universal Time. If $j$ is nonfaulty, then
$UT(t) \in UT^{(j)}(t)$ for all times $t$. At various times, provider $j$
broadcasts the current value of $UT^{(j)}$. Let $UT_p^{(j)}$ denote a clock
range that represents $p$’s knowledge of the current value of $UT^{(j)}$. More
precisely, assume that, if $j$ and $p$ are nonfaulty, then $UT(t)$ lies within
the interval $UT_p^{(j)}(t)$ for all times $t$. If $p$ receives a message at
time $t_0$ informing it that $UT^{(j)}(t_0)$ lies in the interval $R$, then
Proposition 1 implies that we can define $UT_p^{(j)}$ by

$$UT_p^{(j)}(t) = R + (1 \pm \rho_p)(C_p(t) - C_p(t_0))$$

for $t \ge t_0$. In fact, this is how $UT_p^{(j)}(t)$ should be defined if $p$
did not receive any information about $UT^{(j)}$ during the time interval
$[t_0, t]$. The problem of how $j$ broadcasts the value $UT^{(j)}$ is
considered later.

The algorithm for computing the best approximation to $UT$ from a set
of $m$ intervals, each asserted to contain $UT$, was first obtained by
Marzullo. It appears as Algorithm 4-2 in his thesis [6]. Define
$M_f^m(U_1,\ldots,U_m)$ to be the largest interval whose endpoints belong to at
least $m - f$ of the intervals $U_i$. It is not hard to show that, if one knows
only that $UT$ lies in all but at most $f$ of the intervals $U_i$, then
$M_f^m(U_1,\ldots,U_m)$ is the smallest interval known to contain $UT$. The
value of $M_f^m(U_1,\ldots,U_m)$ can be computed from the set of $m$ intervals
$U_i$ in $O(m \log m)$ time by sorting their endpoints. It can be recomputed in
$O(m)$ time if just one of the $U_i$ changes.

If the intersection of $U_1$ with $M_f^m(U_1,\ldots,U_m)$ is empty, then
$M_f^m(U_1,\ldots,U_m) = M_{f-1}^{m-1}(U_2,\ldots,U_m)$. In this case, $U_1$ is
known to be one of the intervals that does not contain $UT$ (there are at most
$f$ of them), so it may be thrown away when computing the best approximation to
$UT$. After throwing it away, we are left with $m - 1$ intervals, all but at
most $f - 1$ of them containing $UT$. More generally, if $k$ of the $U_i$ have
an empty intersection with $M_f^m(U_1,\ldots,U_m)$, then
$M_f^m(U_1,\ldots,U_m)$ equals the value obtained by throwing away those $k$
intervals and applying $M_{f-k}^{m-k}$ to the remaining intervals.
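
The definition of the function $M_f^m$ can be turned into a short endpoint
sweep. The Python sketch below is one possible way to compute it by sorting
endpoints, written for illustration; it is not a transcription of Algorithm
4-2 from Marzullo's thesis, and the name `marzullo` and the tuple
representation of intervals are assumptions made here.

```python
def marzullo(intervals, f):
    """Largest interval whose endpoints lie in at least m - f inputs.

    intervals is a list of closed intervals (lo, hi); f is the number of
    inputs allowed to be wrong. Returns None if no point is covered by
    m - f of the intervals.
    """
    m = len(intervals)
    if not 0 <= f < m:
        raise ValueError("need 0 <= f < number of intervals")
    need = m - f

    # Kind 0 (start) sorts before kind 1 (end) at equal coordinates, so
    # closed intervals that merely touch still count as overlapping.
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))
        events.append((hi, 1))
    events.sort()

    count = 0
    best_lo = None
    best_hi = None
    for x, kind in events:
        if kind == 0:
            count += 1
            if count >= need and best_lo is None:
                best_lo = x      # smallest point covered often enough
        else:
            if count >= need:
                best_hi = x      # largest such point seen so far
            count -= 1
    if best_lo is None:
        return None
    return (best_lo, best_hi)
```

With the inputs `(8, 12)`, `(11, 13)`, and `(14, 15)` and `f = 1`, the result
is `(11, 12)`. The third interval does not intersect it, and discarding that
interval and calling the function on the remaining two with `f = 0` gives the
same answer, as the paragraph above describes. Sorting dominates the cost, so
the sketch runs in $O(m \log m)$ time.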

Assume that up to $f$ of the $m$ values $UT_p^{(j)}$ may be incorrect, where an
interval $UT_p^{(j)}$ is correct if $UT$ always lies within it. The obvious way
to choose $I_p$, an interval that process $p$ knows to contain $UT$, is to let
it equal $M_f^m(UT_p^{(1)},\ldots,UT_p^{(m)})$. However, suppose that at some
time when $C_p$ has the value $C$, $M_f^m(UT_p^{(1)},\ldots,UT_p^{(m)})$ equals
the interval $U$. When $C_p$ has the value $C + \Delta C$, $UT$ must lie in the
interval $U + (1 \pm \rho_p)\Delta C$. However, the intervals $UT_p^{(j)}$ are
spreading out at a rate $\rho_p$, which could cause the value of $M_f^m$ to
spread out at a faster rate; in fact, to make discontinuous jumps. Thus, when
$C_p$ has the value $C + \Delta C$, $M_f^m(UT_p^{(1)},\ldots,UT_p^{(m)})$ could
be a larger interval than $U + (1 \pm \rho_p)\Delta C$.

In Marzullo’s algorithm, $p$ computes an initial value $I_p(t_0)$ from initial
values $UT_p^{(j)}(t_0)$ of the $UT_p^{(j)}$ as follows. It throws away any of
the $UT_p^{(j)}(t_0)$ that it decides are incorrect. If $m - k$ intervals are
currently believed to be correct, then the interval $I_p(t_0)$ equals
$M_{f-k}^{m-k}$ applied to the $m - k$ correct intervals $UT_p^{(j)}(t_0)$. If
no new values are received from the Universal Time providers between times
$t_0$ and $t$, then, ignoring terms of order $\rho_p^2 (t - t_0)$, $I_p(t)$ is
defined by

$$I_p(t) = I_p(t_0) + (1 \pm \rho_p)(C_p(t) - C_p(t_0))$$

In other words, when $p$ receives no new information, the interval $I_p$
advances (moves right along the number line) at $p$’s clock rate and widens at
the rate of $2\rho_p$ seconds per second of clock time.

When a new value for $UT_p^{(j)}$ arrives, process $p$ adds the value to the
set of intervals that it presumes to be correct, throwing away the previous
value of $UT_p^{(j)}$. Process $p$ next computes $I_p$ by applying
$M_{f-k}^{m-k}$ to the $m - k$ intervals currently presumed correct. It then
declares to be incorrect any of these intervals (the ones it had presumed to be
correct) that have empty intersections with $I_p$.

When no new information is received from the Universal Time providers,
the value of $I_p$ “deteriorates” at the rate of $\rho_p$ seconds per second.
To maintain the accuracy of $I_p$, it is necessary for the Universal Time
providers to broadcast their values of $UT^{(j)}$ sufficiently often. Suppose
that a provider $j$ sends its clock range $UT^{(j)}$ to $p$ at time $t$ by
simply sending a message with the interval $UT^{(j)}(t)$ along some path
$\pi$. If $p$ receives this message at time $t'$, it sets $UT_p^{(j)}$ to
$UT^{(j)}(t)^\pi$. Suppose that each provider $j$ sends $UT^{(j)}$ in this way
at least once every $J$ seconds over a path $\pi$ with $\gamma^\pi < \gamma$.
It is then easy to show that if $\|UT^{(j)}\| < \epsilon$ for every nonfaulty
provider $j$ and at least $m - f$ of the time providers and their paths $\pi$
are nonfaulty, then Marzullo’s algorithm guarantees that for all times $t$,
$I_p(t)$ is contained within the interval
$UT(t) \pm (\epsilon + \gamma + \rho_p J)$.
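
To get a feel for the size of this bound, consider a worked example. The
numbers are assumptions chosen only for illustration and do not come from the
report: provider intervals of width less than one millisecond, a path
uncertainty of ten milliseconds, a local clock rate error of $10^{-5}$, and a
broadcast at least every ten minutes.

```latex
\epsilon = 0.001\,\mathrm{s}, \quad
\gamma = 0.010\,\mathrm{s}, \quad
\rho_p = 10^{-5}, \quad
J = 600\,\mathrm{s}

\rho_p J = 10^{-5} \times 600\,\mathrm{s} = 0.006\,\mathrm{s}

\epsilon + \gamma + \rho_p J
  = 0.001 + 0.010 + 0.006
  = 0.017\,\mathrm{s}
```

Under these assumed values, $I_p(t)$ stays within about 17 milliseconds of
$UT(t)$, with the path uncertainty as the largest contribution and the clock
drift between broadcasts as the second largest.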


4 Providing the Service Time 


4.1 Ideal Time 


A time server $p$ periodically receives information allowing it to refine the
clock $T_p$ that represents the service time it provides to its clients.
Information comes in discrete lumps, usually through the receipt of a message.
To maintain the continuity of $T_p$ (more precisely, to maintain the
boundedness of its rate of change), the value of $T_p$ cannot change instantly
in response to new information. Instead, the rate of change of $T_p$ is
modified discontinuously. This section presents a method for computing
$T_p$’s rate of change. Process $p$ computes $T_p$ from its clock $C_p$ by the
formula:

$$T_p = a_p C_p + b_p$$

where $a_p$ and $b_p$ are constants that are changed discontinuously, that is,
they are piecewise constant functions of time. The value $b_p$ represents a
zero-point correction;³ the value $a_p$ represents a correction to the running
rate of $C_p$. I will discuss the way $a_p$ changes; the change to $b_p$ is
determined by the requirement that $T_p$ be continuous and will be ignored.

There are two corrections embodied in the value of $a_p$: a correction to
compensate for the measured inaccuracy of the local clock $C_p$, and a
correction to bring $T_p$ into synchrony with $UT$ and with the times $T_q$
provided by other servers $q$. The component of $a_p$ that compensates for the
inaccuracy of $C_p$ effectively reduces the error $\rho_p$ in its rate. It can
be obtained by comparing $C_p$ with $I_p$ for a long enough period of time. I
will ignore this component and assume that any difference between $a_p$ and 1
is meant as a correction to achieve synchronization. Such a correction actually
increases the error $\kappa_p$ in the rate of change of $T_p$. This increase is
unavoidable. If clock synchronization is to be maintained, $\kappa_p$ must be
allowed to become larger than $\rho_p$, the inherent error in the rate of
$p$’s local clock.

It is easiest to describe synchronization algorithms in terms of
discontinuously resetting the time. There is a sequence of resynchronization
times $T^{(0)}$, $T^{(1)}$, $T^{(2)}$, ... at which processes resynchronize.
Every process $p$ changes $a_p$ when $T_p = T^{(i)}$, so the $T^{(i)}$
represent service times. For convenience, I assume that all processes
resynchronize at every time $T^{(i)}$; a process $p$ that does nothing at that
time can be thought of as performing a resynchronization in which the new
value of $a_p$ equals its old value. Time $T^{(0)}$ represents the service time
at which the system is started.

Resynchronization may actually be performed by having every process
agree when their time $T_p$ should read $T^{(i)}$. However, for this
discussion, it is more convenient to have a process convert this into the
inverse information: the time that $T_p$ should read when it actually reads
$T^{(i)}$. Assuming $\rho_p \ll 1$, if $p$ discovers that $T_p$ should read
$T^{(i)}$ when it actually reads $T^{(i)} + \Delta T$, then it knows that
$T_p$ should read approximately $T^{(i)} - \Delta T$ when it actually reads
$T^{(i)}$.


³If $C_p$ is actually a cyclic timer instead of a monotonic clock, then $b_p$
is incremented every time $C_p$ is reset to zero.
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At the i** resynchronization, process p thus learns what the “correct” 
service time should be when T, reads T), and uses this information to reset 
ay. (The resynchronization must be carried out in such a way that p receives 
this information before its clock reaches T*).) Here, the “correct” time is 
the one that is agreed upon by all processes as the one to which they want to 
resynchronize. The most convenient way to describe resynchronization is to 
have p maintain an “ideal” p-clock on that keeps the “correct” time learned 
during the i** resynchronization—in other words, cf) is the p-clock that has 
the “correct” time when Ty, equals T(). Hence, C) represents p’s knowledge, 
as of time T), of what the current service time should be. This knowledge 
becomes obsolete at service time T('+!), when the resynchronization provides 
p with more recent information about the “correct” time. Initially a, = 1 
and T, = aN so T, equals cf) from the time T() when the system is 
started until T(), 

To resynchronize $T_p$ at service time $T^{(i)}$, $p$ resets the rate of
change of $T_p$ so that, in the absence of any further resynchronization,
$T_p$ would equal $C_p^{(i)}$ after exactly $J$ seconds, where $J$ is some
fixed constant. In other words, if $p$ learned that, when $T_p$ reads
$T^{(i)}$, the service time should be $T_p + \Delta T$, then $p$ sets $a_p$
equal to $1 + (\Delta T/J)$. Thus, $T_p$ is always chasing the current
“ideal” clock $C_p^{(i)}$. It is convenient to assume that there is at least
one resynchronization every $J$ seconds. (We can always add a null
resynchronization in which the new ideal clock $C_p^{(i+1)}$ is the same as
the old one $C_p^{(i)}$.) Thus, if there is a nonzero correction during each
resynchronization, then $T_p$ is always chasing the current ideal clock but
never catches it.

Finally, let us make one minor change to this algorithm. Instead of
performing the $i$th resynchronization when $T_p$ equals $T^{(i)}$, we
perform it when $C_p^{(i)}$ equals $T^{(i)}$. This is a minor difference,
since we expect $T_p$ and $C_p^{(i)}$ to be close together when either of
them reads $T^{(i)}$. However, as Proposition 5 below indicates, the
different ideal clocks $C_p^{(i)}$ will be a little more closely
synchronized to one another than the service times $T_p$, so it is slightly
better to use them to control the resynchronization. We can now restate our
algorithm formally as follows, where $J$ is a fixed parameter.


Resynchronization Algorithm: Let $T^{(0)}$, $T^{(1)}$, ... be an unbounded
increasing sequence of times with $T^{(j+1)} - T^{(j)} < J$ for all $j$; let
$C_p^{(0)}$, $C_p^{(1)}$, ... be a sequence of $p$-clocks; for $i > 0$, let
$t_p^{(i)}$ be the Universal Time such that $C_p^{(i)}(t_p^{(i)}) = T^{(i)}$;
and let $t_p^{(0)}$ be the Universal Time such that
$C_p^{(0)}(t_p^{(0)}) = T^{(0)}$. Then the service clock $T_p$ is defined for
$t \ge t_p^{(0)}$ by

    $T_p(t) = a_p(t) C_p(t) + b_p(t)$

where $a_p$ and $b_p$ are defined as follows:

• For $t_p^{(0)} \le t < t_p^{(1)}$: $a_p(t) = 1$ and
  $b_p(t) = T^{(0)} - C_p(t_p^{(0)})$.

• For $t_p^{(i)} \le t < t_p^{(i+1)}$, $i > 0$:
  $a_p(t) = 1 + (C_p^{(i)}(t_p^{(i)}) - T_p(t_p^{(i)}))/J$ and
  $b_p(t) = b_p(t_p^{(i-1)}) + (a_p(t_p^{(i-1)}) - a_p(t)) C_p(t_p^{(i)})$,
  so that $T_p$ remains continuous at $t_p^{(i)}$.
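The definition above can be exercised numerically. The following is only a sketch under stated assumptions: the local clock is taken to be linear, $C_p(t) = \mathit{rate} \cdot t$, and each ideal clock $C_p^{(i)}$ is represented as the local clock plus a constant offset (it is a $p$-clock, so it runs at the same rate as $C_p$). The names `build_segments`, `service_time`, `offsets`, and `rate` are hypothetical and do not come from the paper.

```python
from bisect import bisect_right

def build_segments(T, offsets, J, rate=1.0):
    """Return a list of (t_i, a_p, b_p) describing T_p piecewise.

    T[i]       : resynchronization service times T^(i)
    offsets[i] : assumed constant such that C_p^(i)(t) = C_p(t) + offsets[i]
    rate       : assumed slope of the local clock, C_p(t) = rate * t
    """
    c = T[0] - offsets[0]          # local-clock reading at t_p^(0)
    a, b = 1.0, T[0] - c           # a_p = 1, b_p = T^(0) - C_p(t_p^(0))
    segments = [(c / rate, a, b)]
    for Ti, off in zip(T[1:], offsets[1:]):
        c = Ti - off               # C_p(t_p^(i)), where C_p^(i) reads T^(i)
        tp = a * c + b             # T_p just before the resynchronization
        a_new = 1.0 + (Ti - tp) / J
        b = b + (a - a_new) * c    # keeps T_p continuous at t_p^(i)
        a = a_new
        segments.append((c / rate, a, b))
    return segments

def service_time(segments, t, rate=1.0):
    """Evaluate T_p(t) = a_p(t) * C_p(t) + b_p(t) at Universal Time t."""
    starts = [s[0] for s in segments]
    k = bisect_right(starts, t) - 1
    if k < 0:
        raise ValueError("t precedes the start time t_p^(0)")
    _, a, b = segments[k]
    return a * (rate * t) + b

segs = build_segments(T=[0.0, 10.0], offsets=[0.0, 0.5], J=20.0)
print(service_time(segs, 9.5), service_time(segs, 29.5))
```

With a single correction of 0.5 and no further resynchronization, the service clock in this example is continuous at the resynchronization and meets the ideal clock exactly $J = 20$ local-clock seconds later.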


What conditions are required of the ideal clocks $C_p^{(i)}$ to guarantee
that the $T_p$ defined by the Resynchronization Algorithm satisfy the
correct-rate, synchronization, and correct-time conditions? Since each
$C_p^{(i)}$ is a $p$-clock, running at the same rate as $C_p$, we know that
the ideal clocks $C_p^{(i)}$ satisfy the correct-rate condition with bounds
$\rho_p$. We expect that they must also satisfy the synchronization and
correct-time conditions. If the ideal clocks $C_p^{(i)}$ satisfy these
conditions, then the service clocks $T_p$ will too, provided that each $T_p$
remains close to the current ideal clock $C_p^{(i)}$. (Of course, the actual
bounds in these conditions will not be the same for the $T_p$ as for the
$C_p^{(i)}$.)

Since $T_p$ is always “chasing” the current $C_p^{(i)}$, we need a bound on
how fast the ideal clocks can change as a result of resynchronization. The
required condition is that there exist a constant $\sigma_p$ such that,
during any time interval of length $J$, the total amount by which $p$'s
ideal clocks are changed is less than $\sigma_p$. (The constant $J$ is the
parameter of the Resynchronization Algorithm.)

bounded correction with constants $\sigma_p$: For all $p$ and all $j$, $k$:
if $j < k$ and $T^{(k)} - T^{(j)} < J$, then

    $\sum_{i=j}^{k-1} |C_p^{(i+1)} - C_p^{(i)}| < \sigma_p$

The following result shows that the bounded-correction condition ensures
that $T_p$ stays close to $C_p^{(i)}$. The appearance of $e$ (which equals
2.71828...) is somewhat surprising.

Proposition 3 If the $C_p^{(i)}$ satisfy the bounded-correction condition
with constants $\sigma_p$, then the Resynchronization Algorithm ensures that
for every Universal Time $t$ such that
$T^{(i)} \le C_p^{(i)}(t) < T^{(i+1)}$:

    $|C_p^{(i)}(t) - T_p(t)| < e\sigma_p/(e - 1)$

where terms of order $\sigma_p^2/J$ are neglected.


Proof: Define the time-dependent value $C$ by $C(t) = C_p^{(i)}(t)$, for $i$
determined by the condition $T^{(i)} \le C_p^{(i)}(t) < T^{(i+1)}$. In other
words, $C(t)$ is the time read at Universal Time $t$ by the ideal clock
$C_p^{(i)}$ being used at time $t$. Observe that $C(t)$ advances at the same
rate as $C_p(t)$ except that it is incremented by
$C_p^{(i)} - C_p^{(i-1)}$ at the resynchronization (Universal) times
$t_p^{(i)}$. The bounded-correction condition means that the sum of the
absolute values of all such corrections performed during an interval of
length $J$ as measured by $C$ is less than $\sigma_p$. For convenience, I
assume that this condition holds when the length of the interval is measured
by $p$'s local clock $C_p$ rather than by UT. This introduces an error in
the length of the interval of at most $\sigma_p$, which will introduce an
error of order at most $\sigma_p^2/J$ in our bounds.

Let $\alpha(t)$ equal $|C(t) - T_p(t)|$. We must show that
$\alpha(t) < e\sigma_p/(e - 1)$. A rigorous, straightforward proof is
obtained by computing $\alpha(t + \Delta t)$ as a function of $\alpha(t)$
and the resynchronizations performed during the interval
$(t, t + \Delta t)$. Such a proof is tedious and unenlightening. Instead, a
less rigorous but more intuitive proof is given.

Since $C_p$ satisfies the correct-rate condition with bound less than one,
we can approximate it arbitrarily closely on the interval
$[t, t + \Delta t]$ by a differentiable function $\hat{C}_p$ with a strictly
positive derivative. We can then replace all functions of Universal Time by
their composition with $\hat{C}_p^{-1}$. This substitution leads to the same
formulas we would have if $C_p$ were a perfect clock, with $C_p(t) = t$ for
all $t$. (In other words, the substitution effects a change of coordinates
from Universal Time to local-clock time.) Therefore, we may assume without
loss of generality that $C_p$ is the identity function, so $C_p$ is the
Universal Time clock UT.

Let $t = t_0 < t_1 < \cdots < t_n = t + J$, and assume that the only
resynchronizations with nonzero corrections to $C$ in the interval
$[t, t + J]$ occur at the times $t_i$. Let $c_i$ be the change to $C$ at
time $t_i$, and let $\Delta t_i = t_i - t_{i-1}$. Then
$\sum_{i=1}^{n} \Delta t_i = J$ and, by the bounded-correction hypothesis,
$\sum_{i=1}^{n} |c_i| < \sigma_p$.

In addition to the resynchronizations at time $t_i$, we can have an
arbitrary number of resynchronizations with zero correction. Such a
resynchronization has the effect of slowing the rate at which $T_p$
converges towards $C$. The maximum value for $\alpha$ is achieved by doing
as many such resynchronizations as possible. The value of $\alpha$ obtained
by any finite number of resynchronizations is less than the value obtained
in the limiting case of continual resynchronization, in which
$d\alpha/dt = -\alpha/J$ on each interval $(t_{i-1}, t_i)$. A bit of
calculus then shows that

    $\alpha(t_i) \le (\alpha(t_{i-1}) + |c_i|)\, e^{-\Delta t_i/J}$

from which we deduce

    $\alpha(t + J) < \sigma_p + (\alpha(t)/e)$


It is easy to show from this that if $\alpha(t) < e\sigma_p/(e - 1)$ then
$\alpha(t + J) < e\sigma_p/(e - 1)$. To complete the proof of the
proposition, we need only show that $\alpha(t) < e\sigma_p/(e - 1)$ holds
for all $t$ in the initial interval $[T^{(0)}, T^{(0)} + J]$. However, this
follows from the bounded-correction hypothesis and the fact that $T_p$
initially equals $C_p^{(0)}$, so $\alpha(T^{(0)}) = 0$. ∎


I leave it as an exercise for the reader to show that the bound of
Proposition 3 is the best possible one. (Consider a scenario in which there
is a resynchronization every $J$ seconds that advances the ideal clock by
almost $\sigma_p$ and a large number of “zero resynchronizations”.)
Proposition 3 immediately implies the following two results.


Proposition 4 If the $C_p^{(i)}$ satisfy the bounded-correction condition
with constants $\sigma_p$, then the $T_p$ chosen by the algorithm above
satisfy the correct-rate condition with bounds
$\rho_p + e\sigma_p/((e - 1)J)$ (neglecting terms of order
$\sigma_p^2/J^2$). If, for each fixed $i$, the $C_p^{(i)}$ also satisfy the
correct-time condition with bounds $\epsilon_p$ on the interval
$[T^{(i)}, T^{(i+1)}]$, then the $T_p$ satisfy the correct-time condition
with bounds $\epsilon_p + e\sigma_p/(e - 1)$ (neglecting terms of order
$\sigma_p^2/J$).


Proposition 5 If, for each fixed $i$, the $C_p^{(i)}$ satisfy the
synchronization condition with bounds $\delta_{pq}$ on the interval
$[T^{(i)}, T^{(i+1)}]$, then the $T_p$ chosen by the algorithm above satisfy
the synchronization condition with bounds
$\delta_{pq} + (2e\sigma_p/(e - 1))$ (neglecting terms of order
$\sigma_p^2/J$).


These results are based upon the assumption that $T_p$ initially equals
$C_p^{(0)}$. Suppose this is not the case, so $T_p$ initially differs from
the ideal clock $C_p^{(0)}$ by some quantity $\Delta T_0$. The argument used
in the proof of Proposition 3 shows that at time $T^{(0)} + J$, $T_p$ will
differ from its ideal clock by a quantity $\Delta T_1$ that is less than
$\sigma_p + \Delta T_0/e$, at time $T^{(0)} + 2J$, $T_p$ will differ from
its ideal clock by $\Delta T_2 < \sigma_p + \Delta T_1/e$, and so on. Thus,
$T_p$ will keep getting closer to the ideal clock until it is within
$e\sigma_p/(e - 1)$.

Similarly, the synchronization condition assumes that initially
$|T_p - T_q| < \delta_{pq}$. If this condition is not met, then the bound on
$|T_p - T_q|$ will keep getting smaller until it eventually reaches a value
less than $\delta_{pq} + (2e\sigma_p/(e - 1))$.
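The convergence described above can be checked by iterating the worst-case recurrence $\Delta T_{n+1} = \sigma_p + \Delta T_n/e$, whose fixed point is $e\sigma_p/(e - 1)$. The sketch below treats the upper bound as if it were attained at every step, which is an assumption made purely for illustration; the function names `settle` and `limit` are hypothetical.

```python
import math

def settle(delta0, sigma, steps):
    """Iterate the worst-case bound dT_{n+1} = sigma + dT_n / e."""
    d = delta0
    history = [d]
    for _ in range(steps):
        d = sigma + d / math.e
        history.append(d)
    return history

def limit(sigma):
    """Fixed point of the recurrence: e * sigma / (e - 1)."""
    return math.e * sigma / (math.e - 1)

# Starting far from the ideal clock, the bound shrinks towards the limit.
for n, d in enumerate(settle(delta0=10.0, sigma=0.1, steps=5)):
    print(n, round(d, 4))
print("limit:", round(limit(0.1), 4))
```

Each step of length $J$ removes a factor of $1/e$ of the excess over the fixed point, so an initial error of any size is reduced to essentially $e\sigma_p/(e - 1)$ after a modest number of intervals.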


4.2 Synchronization and Time-Correctness of Ideal Clocks 


Propositions 4 and 5 show that the $T_p$ satisfy the synchronization and
correct-time conditions if the ideal clocks $C_p^{(i)}$ satisfy these
conditions during the interval $[T^{(i)}, T^{(i+1)}]$ and the sequence of
ideal clocks $C_p^{(0)}$, $C_p^{(1)}$, ... satisfies the bounded-correction
condition. Moreover, suppose that the $C_p^{(i)}$ satisfy the
synchronization and correct-time conditions with bounds $\delta_{pq}$ and
$\epsilon_p$ just at service time $T^{(i)}$, that is, on the interval
$[T^{(i)}, T^{(i)}]$. In other words, suppose only that the ideal clocks are
synchronized to within $\delta_{pq}$ and lie within $\epsilon_p$ of
Universal Time when they read $T^{(i)}$. It is easy to see that the
$C_p^{(i)}$ must then satisfy the synchronization and correct-time
conditions on the entire interval $[T^{(i)}, T^{(i+1)}]$ with bounds
$\delta_{pq} + (\rho_p + \rho_q)(T^{(i+1)} - T^{(i)})$ and
$\epsilon_p + \rho_p(T^{(i+1)} - T^{(i)})$, respectively.

The requirement that the $C_p^{(i)}$ satisfy the correct-time condition
places a bound on how much $C_p^{(i)}$ and $C_p^{(i+1)}$ may differ. In
particular, if $C_p^{(i)}$ satisfies the correct-time condition on the
entire interval $[T^{(i)}, T^{(i+1)}]$ with bound
$\epsilon_p + \rho_p(T^{(i+1)} - T^{(i)})$, and $C_p^{(i+1)}$ satisfies the
correct-time condition at time $T^{(i+1)}$ with bound $\epsilon_p$, then
$|C_p^{(i+1)} - C_p^{(i)}| < 2\epsilon_p + \rho_p(T^{(i+1)} - T^{(i)})$.
These inequalities together with the propositions above easily imply the
following result, where the hypothesis asserts that there is at least one
and at most $r$ resynchronizations performed every $J$ seconds.


Proposition 6 If, for each $i$, the clocks $C_p^{(i)}$ satisfy the
synchronization condition with bounds $\delta_{pq}$ and the correct-time
condition with bounds $\epsilon_p$ at time $T^{(i)}$, and there is at least
one and at most $r$ of the $T^{(i)}$ in any interval of the form
$[t, t + J)$, then the Resynchronization Algorithm guarantees that the
clocks $T_p$ satisfy:

• the correct-rate condition with bounds

    $\left(1 + \frac{e}{e-1}\right)\rho_p + \frac{2re}{(e-1)J}\,\epsilon_p$

  (neglecting terms of order $(r\epsilon_p/J)^2$ and $\rho_p^2$).

• the synchronization condition with bounds

    $\delta_{pq} + \left(\left(1 + \frac{2e}{e-1}\right)\rho_p + \rho_q\right) J + \frac{4re}{e-1}\,\epsilon_p$

  (neglecting terms of order $(r\epsilon_p)^2/J$ and $\rho_p^2 J$).

• the correct-time condition with bounds

    $\left(1 + \frac{2re}{e-1}\right)\epsilon_p + \left(1 + \frac{e}{e-1}\right)\rho_p J$

  (neglecting terms of order $(r\epsilon_p)^2/J$ and $\rho_p^2 J$).
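To get a feel for the magnitudes in Proposition 6, the three bounds can be evaluated for concrete parameters. This sketch assumes my reading that the bounds arise from substituting $\sigma_p = 2r\epsilon_p + \rho_p J$ (the per-resynchronization inequality summed over at most $r$ resynchronizations in an interval of length $J$) into Propositions 4 and 5, with ideal-clock drift taken over an interval of length $J$. The function name `prop6_bounds` and the sample values are hypothetical.

```python
import math

def prop6_bounds(rho_p, rho_q, eps_p, delta_pq, r, J):
    """Return (rate, sync, correct_time) bounds, neglecting higher-order terms."""
    e = math.e
    rate = (1 + e / (e - 1)) * rho_p + 2 * r * e * eps_p / ((e - 1) * J)
    sync = (delta_pq
            + ((1 + 2 * e / (e - 1)) * rho_p + rho_q) * J
            + 4 * r * e * eps_p / (e - 1))
    correct_time = ((1 + 2 * r * e / (e - 1)) * eps_p
                    + (1 + e / (e - 1)) * rho_p * J)
    return rate, sync, correct_time

# Illustrative numbers only: drift 1e-5, 10 ms accuracy, J = 1 hour.
rate, sync, ct = prop6_bounds(rho_p=1e-5, rho_q=1e-5, eps_p=0.010,
                              delta_pq=0.005, r=4, J=3600.0)
print(f"rate bound {rate:.3e}, sync bound {sync:.4f} s, "
      f"correct-time bound {ct:.4f} s")
```

The rate bound falls as $J$ grows while the synchronization and correct-time bounds grow with $\rho_p J$, which is the trade-off that governs the choice of $J$.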


In order to compute $I_p$ and $I_q$, processes $p$ and $q$ can use the
values for Universal Time provided by different subsets of the Universal
Time providers. Process $p$ does not care which values are used by process
$q$. However, the following result indicates that to achieve the
synchronization condition with $\delta_{pq} < \epsilon_p + \epsilon_q$, it
is necessary for $p$ and $q$ to agree to compute the values $C_p^{(i)}$ and
$C_q^{(i)}$ using values of UT obtained from the same set of providers $j$.
To apply this proposition in our case, let $F$ and $G$ be the functions used
to compute $T_p$ and $T_q$ from the values $UT^{(j)}(t)$ broadcast by the
Universal Time providers. As I observed earlier, we expect these functions
to be translation invariant. (Recall that $\|U\|$ is the width of the
interval $U$.)


Proposition 7 Let $F$ and $G$ be translation-invariant functions on
$n$-tuples of intervals such that for any intervals $U_1$, ..., $U_n$: if,
for each $j$, $\|U_j\| \le \epsilon$ and the intersection of all the $U_j$
is nonempty, then $|G(U_1, \ldots, U_n) - F(U_1, \ldots, U_n)| < \delta$.
If $\delta < \epsilon$, then there is some $j$ such that the values of both
$F(U_1, \ldots, U_n)$ and $G(U_1, \ldots, U_n)$ depend upon the value of
$U_j$.


Proof: We assume that there is no such $j$ and show that
$\epsilon \le \delta$. This assumption implies that we can renumber the
arguments so that, for some $k$, the value of $F$ depends only upon its
first $k$ arguments and the value of $G$ depends only upon its last $n - k$
arguments. Let $F'(U, V) = F(U, \ldots, U, V, \ldots, V)$ and
$G'(U, V) = G(U, \ldots, U, V, \ldots, V)$, with $k$ copies of $U$ and
$n - k$ copies of $V$. The value of $F'$ depends only on its first argument
and the value of $G'$ depends only on its second argument. Let $U$ be an
interval of width $\epsilon'$, where $\epsilon' < \epsilon$. Without loss of
generality, we can assume that $F'(U, U) \ge G'(U, U)$. The hypothesis
implies that $|G'(U + \epsilon', U) - F'(U + \epsilon', U)| < \delta$. Since
the value of $G'$ does not depend upon its first argument,
$G'(U + \epsilon', U) = G'(U, U)$; similarly,
$F'(U + \epsilon', U) = F'(U + \epsilon', U + \epsilon')$. Hence,
$|G'(U, U) - F'(U + \epsilon', U + \epsilon')| < \delta$. However, the
translation invariance of $F'$ implies that
$F'(U + \epsilon', U + \epsilon') = F'(U, U) + \epsilon'$. Since
$F'(U, U) \ge G'(U, U)$, this allows us to conclude that
$\epsilon' < \delta$. This is true for any $\epsilon' < \epsilon$, which
implies the desired result $\epsilon \le \delta$. ∎


Thus, $p$ and $q$ must agree upon a set of Universal Time providers whose
values they will use in computing $C_p^{(i)}$ and $C_q^{(i)}$. This set may
change for different values of $i$ (different resynchronizations). The
method described in [3] can be employed to obtain agreement on the current
set of Universal Time providers that are to be used. Here, let us assume
that the values $UT^{(1)}$, ..., $UT^{(m)}$ of $m$ providers are used.

To perform the $i$th resynchronization, each process $p$ obtains a set of
intervals $U_p^{(1)}$, ..., $U_p^{(m)}$, where $U_p^{(j)}$ is the value
obtained from Universal Time provider $j$. (It is the current value of
$UT^{(j)}$ “smeared out” by uncertainties in message-transmission time.)
Process $p$ sets its $p$-clock $C_p^{(i)}$ to equal
$F(U_p^{(1)}, \ldots, U_p^{(m)})$ (when $C_p^{(i)} = T^{(i)}$), where $F$ is
some real-valued function of $m$ intervals.

What properties must $F$ have? As we indicated above, we expect $F$ to be
translation invariant. We also require that $F$ satisfy the Lipschitz
condition for some pseudo-metric. Recall that the Lipschitz condition means
that changing each argument by less than $\delta$ changes the value of $F$
by less than $\delta$. Translation invariance implies that moving each
interval a distance of $\delta$ “in the same direction” changes the value of
$F$ by $\delta$, so the Lipschitz condition is the strongest “continuity
bound” that can be achieved.

The Lipschitz condition implies that we can satisfy the synchronization
condition for the $C_p^{(i)}$ by ensuring that
$d(U_p^{(j)}, U_q^{(j)}) < \delta_{pq}$ for all $j$. Intuitively, the
Lipschitz condition ensures that $p$ and $q$ will be closely synchronized
if, for each Universal Time provider $j$, the values they obtain from $j$
are almost the same.

Marzullo's function $M^f$ was defined so that $M^f(U_1, \ldots, U_m)$ is the
largest interval whose endpoints lie within at least $m - f$ of the
intervals $U_1$, ..., $U_m$. One might be tempted to define the function $F$
by letting $F(U_1, \ldots, U_m)$ be the midpoint of
$M^f(U_1, \ldots, U_m)$. However, this function does not satisfy a Lipschitz
condition; in fact, it is not even continuous. Its discontinuity is
illustrated in Figure 1, where $m = 4$, $f = 1$,
$I_p = M^1(U_p^{(1)}, \ldots, U_p^{(4)})$, and
$I_q = M^1(U_q^{(1)}, \ldots, U_q^{(4)})$. In this example, the values of
$U_p^{(1)}$ and $U_q^{(1)}$ are ones that could be obtained if Universal
Time provider 1 is faulty. Processes $p$ and $q$ see the same values of
$U^{(2)}$, $U^{(3)}$, and $U^{(4)}$, and they see values of $U^{(1)}$ that
are almost the same. However, when they apply Marzullo's function to these
sets of intervals, they compute very different intervals $I_p$ and $I_q$.
Recall that Marzullo's function computes the optimal value of $I_p$, that
is, the smallest interval that $p$ knows to contain UT. It is this
discontinuity in the optimal $I_p$ that makes it impossible, in the presence
of malicious faults, to satisfy the synchronization condition with the extra
requirement that $T_p$ lies within $I_p$. More precisely, it can be shown
that if a nonfaulty process may assume that it is nonfaulty, then the
synchronization condition is incompatible with the requirement that each
$T_p$ lies within $I_p$ when the $\delta_{pq}$ are smaller than half the
widths of the intervals $UT^{(j)}$.

[Figure 1: Example of discontinuity of Marzullo's function. (a) Process p's
computation. (b) Process q's computation.]
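The discontinuity can be reproduced with a few lines of code. The sketch below implements $M^f$ directly from the definition quoted above (the largest interval whose endpoints each lie in at least $m - f$ of the inputs), treating intervals as closed pairs `(lo, hi)`. The specific interval values are invented for illustration and are not those of Figure 1; the function name `marzullo` is hypothetical.

```python
def marzullo(intervals, f):
    """Largest interval whose endpoints lie in at least m - f of the inputs."""
    need = len(intervals) - f

    def covered(x):
        # Number of closed intervals containing x is at least m - f.
        return sum(lo <= x <= hi for lo, hi in intervals) >= need

    # The smallest sufficiently covered point is always some left endpoint,
    # and the largest is always some right endpoint.
    lefts = [lo for lo, _ in intervals if covered(lo)]
    rights = [hi for _, hi in intervals if covered(hi)]
    if not lefts or not rights:
        raise ValueError("no point lies in m - f intervals")
    return (min(lefts), max(rights))

def midpoint(interval):
    return (interval[0] + interval[1]) / 2

# Provider 1 is taken to be faulty; p and q see nearly the same value from it.
shared = [(0, 20), (10, 30), (19, 40)]
I_p = marzullo([(0, 10)] + shared, f=1)
I_q = marzullo([(0, 9.9)] + shared, f=1)
print(I_p, midpoint(I_p))
print(I_q, midpoint(I_q))
```

In this example a change of 0.1 in one endpoint of the first interval moves the left endpoint of the result from 10 to 19, so the midpoints differ by 4.5.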

There are a number of functions $F$ that are translation invariant and
satisfy the Lipschitz condition for a suitable choice of pseudo-metric. Two
such functions are obtained by letting $F(U_1, \ldots, U_m)$ equal the
average or the median of the midpoints of the $U_j$. (These functions
satisfy the Lipschitz condition for the midpoint pseudo-metric and therefore
for the uniform metric.) A class of functions that includes both of these is
defined as follows. Let $A^f(U_1, \ldots, U_m)$ be the average of the
multiset of $m - 2f$ numbers obtained by taking the midpoints of all the
$U_j$ and omitting the $f$ lowest and $f$ highest of them. Each $A^f$ (with
$m > 2f$) is translation invariant and satisfies the Lipschitz condition for
the midpoint pseudo-metric. (This follows from the result that, if the
numbers $x_j$ and $y_j$, with $1 \le j \le m$, satisfy
$|x_j - y_j| < \delta$, then for any $s$, the $s$th largest of the $x_j$ and
the $s$th largest of the $y_j$ differ by at most $\delta$.)
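The averaging functions $A^f$ are simple to state in code. The following is a minimal sketch, with intervals represented as `(lo, hi)` pairs; the function name `fault_tolerant_average` is hypothetical.

```python
def fault_tolerant_average(intervals, f):
    """A^f: mean of the midpoints after dropping the f lowest and f highest."""
    m = len(intervals)
    if m <= 2 * f:
        raise ValueError("A^f requires m > 2f")
    mids = sorted((lo + hi) / 2 for lo, hi in intervals)
    kept = mids[f:m - f]          # the m - 2f middle values
    return sum(kept) / len(kept)

good = [(0, 2), (1, 3), (2, 4)]
# One wildly wrong provider has no effect once f = 1 values are trimmed
# from each end.
print(fault_tolerant_average(good + [(100, 102)], f=1))

# Translation invariance: shifting every interval by 3 shifts the result by 3.
shifted = [(lo + 3, hi + 3) for lo, hi in good + [(100, 102)]]
print(fault_tolerant_average(shifted, f=1))
```

With $f = 0$ this is the plain average of the midpoints, and with $m = 2f + 1$ it is their median, which are the two special cases mentioned above.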

The functions $A^f$ actually give us a somewhat stronger bound on
$\delta_{pq}$ than that obtained simply from the Lipschitz condition. If
$d_m(U_p^{(j)}, U_q^{(j)}) < \delta(j)$, then using $A^f$ to compute
$C_p^{(i)}$ and $C_q^{(i)}$ gives an algorithm that satisfies the
synchronization condition with bounds $\delta_{pq}$ equal to the average of
the $m - 2f$ largest of the $\delta(j)$. If the worst-case difference
$\delta(j)$ between the values that $p$ and $q$ obtain from Universal Time
provider $j$ depends upon $j$, then the Lipschitz condition guarantees only
that $\delta_{pq}$ is no larger than the maximum of the $\delta(j)$, while
the averaging function $A^f$ can do better. However, the different values of
$\delta(j)$ will probably be almost the same in a practical application, so
this is not significant.

Next, we consider the correct-time condition:
$|UT - C_p^{(i)}| < \epsilon_p$. Suppose that at most $f$ of the Universal
Time providers may be faulty. If $m \le 2f$, so at least half the Universal
Time providers are faulty, there is not much hope of finding any algorithm
that satisfies the correct-time condition, since all the faulty providers
could give the same incorrect value.* Therefore, we can assume $m > 2f$. It
is then easy to show that there exist nonfaulty providers $j$ and $j'$ such
that $A^f(U_p^{(1)}, \ldots, U_p^{(m)})$ lies between the midpoints of
$U_p^{(j)}$ and $U_p^{(j')}$. Combining this with the result above for the
synchronization condition, it is easy to prove the following result.


Proposition 8 With the notation of the Resynchronization Algorithm, let
$U_p^{(j)}$ be intervals such that, for all $j$:

1. For all $p$ and $q$: $d_m(U_p^{(j)}, U_q^{(j)}) < \delta_{pq}$.

2. For all $p$: if Universal Time provider $j$ is nonfaulty, then
   $UT(t_p^{(i)}) \in U_p^{(j)}$,

where each $C_p^{(i)}$ is chosen so that
$C_p^{(i)}(t_p^{(i)}) = A^f(U_p^{(1)}, \ldots, U_p^{(m)})$. If there are at
most $f$ faulty Universal Time providers, then the $C_p^{(i)}$ satisfy the
synchronization condition with bounds $\delta_{pq}$ (neglecting terms of
order $(\rho_p + \rho_q)\delta_{pq}$) and the correct-time condition with
bounds $\max\{\|U_p^{(j)}\|/2 : \text{provider } j \text{ nonfaulty}\}$ at
time $T^{(i)}$.


Observe that $A^f(U_1, \ldots, U_m)$ depends only upon the midpoints of the
$U_j$, so
$A^f(U_1, \ldots, U_m) = A^f(U_1 \pm \epsilon_1, \ldots, U_m \pm \epsilon_m)$
for any numbers $\epsilon_j$. While the averaging function $A^f$ gives
reasonable worst-case behavior, it does not make the best use of the
available information because it ignores the widths of intervals. Very wide
intervals are given the same weight as narrow ones, even though they provide
less information. One can construct examples in which the function $A^f$
does not provide the best possible approximation to UT. However, I know of
no simple function $F$ satisfying the Lipschitz condition that does better.

*However, even with more than half the providers faulty, it is still
possible to satisfy the synchronization condition.


4.3 Broadcasting Universal Time 


By Proposition 8, the synchronization and correct-time conditions for
nonfaulty processes can be met by broadcasting values from the Universal
Time providers such that the following two conditions are satisfied, where
$d_m$ is the midpoint pseudo-metric on intervals and $U_p^{(j)}$ is the
value obtained by $p$ from server $j$ during resynchronization $i$.

1. If processes $p$ and $q$ are nonfaulty, then for every Universal Time
   provider $j$: $d_m(U_p^{(j)}, U_q^{(j)}) < \delta_{pq}$.

2. If process $p$ and Universal Time provider $j$ are both nonfaulty, then
   $UT(t_p^{(i)})$ lies in the interval $U_p^{(j)}$.

These conditions are very similar to those of the approximate Byzantine
agreement problem [1], in which each process $p$ begins with a real value
$v_p$ and must choose a real value $v'_p$ such that: (i) for nonfaulty
processes $p$ and $q$: $|v'_p - v'_q| < \delta$, and (ii) $v'_p$ lies within
the interval $I$ determined by the largest and smallest of the values $v_q$.
For $\delta < \|I\|$, the approximate Byzantine agreement problem is known
to require $f + 1$ rounds of message passing to handle $f$ failures, even
for simple halting failures in a completely connected network.

We can apply lower-bound results for the approximate Byzantine agreement
problem to the problem of broadcasting a Universal Time provider's value by
letting $v_p$ be the midpoint of the value $U_p^{(j)}$ that process $p$
obtains directly from provider $j$. The broadcast problem then becomes a
special case of the approximate Byzantine agreement problem. Since a faulty
provider may send very different values to different processes, the result
for the approximate Byzantine agreement problem implies that $f + 1$ rounds
of message passing are needed to handle $f$ process failures in a completely
connected network.

We assume that if process $p$ sends a message at time $t$ to process $q$
over a path $\pi$, and $p$, $q$, and $\pi$ are nonfaulty, then the message
is received at some time in the interval
$t + [\tau^\pi_{\min}, \tau^\pi_{\max}]$. However, what if one or more of
the processes and/or channels on the path $\pi$ are faulty? If we rule out
“malicious” process behavior and garbled messages, the only type of failure
possible is for a message sent over a channel $c$ to take longer than
$\tau^c_{\max}$ seconds to be delivered. (A lost message is considered to
take very much longer than $\tau^c_{\max}$ seconds.) Even with malicious
failures, one can guarantee that, with suitably high probability, a faulty
process or channel can do no more than delay a message. Such a guarantee is
achieved by using digital signatures, so a faulty process cannot falsify the
information contained in a message, and by choosing the value of
$\tau^c_{\min}$ so that it is physically impossible for a message to be sent
over channel $c$ in less than $\tau^c_{\min}$ seconds, for example, by
letting $\tau^c_{\min} = 0$. In practice, how one achieves this guarantee
depends upon the class of failure one is willing to tolerate. In most cases,
it suffices to add simple redundancy to messages. However, tolerating
malicious failures requires that a process relay a clock value by appending
a digital signature to it without removing other processes' signatures
[2,4].

The following algorithm by which a Universal Time provider j broadcasts 
a set of clocks to all processes rests upon the assumption that faults can only 
delay (or lose) messages. However, if j is faulty, it may send different values 
to different processes. The choice of the constant k is discussed later. 


Byzantine Clock-Broadcast Algorithm: A Universal Time provider $j$
broadcasts a $p$-clock to every process $p$ as follows. (The sets
$\mathcal{C}_p$ of $p$-clocks are initially empty.)

1. $j$ sends an interval $U^{(j)}$ to all its neighbors.

2. If process $p$ receives the interval $R$ along path $\pi$ at time $t$,
   then it adds to $\mathcal{C}_p$ the $p$-clock $I_p^\pi$ whose value at
   time $t$ equals $R^\pi$, and it relays $R$ to each of its neighbors $q$
   unless one or more of the following conditions holds:

   • $q$ is on the path $\pi$.

   • $\mathcal{C}_p$ already contained $p$-clocks $U$ and $V$ such that the
     left endpoint of $U$ is greater than or equal to the left endpoint of
     $I_p^\pi$ and the right endpoint of $V$ is greater than or equal to the
     right endpoint of $I_p^\pi$.

   • The length of $\pi$ equals $k$.

3. When no more messages can arrive, process $p$ sets $UT_p^{(j)}$ to be the
   $p$-clock whose left and right endpoints are the maxima of the left and
   right endpoints of all the $p$-clocks $I_p^\pi$.
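The per-message logic of steps 2 and 3 can be sketched as follows. This is a simplification under stated assumptions: every $p$-clock is represented by its interval value `(lo, hi)` at one common instant (real $p$-clocks advance with $C_p$), a path is the list of nodes from provider $j$ up to and including $p$, and its length is the number of hops. The names `smear`, `should_relay`, and `combine` are hypothetical.

```python
def smear(R, tau_min, tau_max):
    """R^pi = R + [tau_min, tau_max]: widen R by the transit-time bounds."""
    lo, hi = R
    return (lo + tau_min, hi + tau_max)

def should_relay(known, new, path, q, k):
    """Step 2: decide whether p relays to neighbor q.

    known : intervals already in p's set before `new` was added
    new   : the interval just derived from the received message
    path  : nodes the message traversed, from the provider to p (assumed form)
    """
    if q in path:                      # q is on the path
        return False
    if len(path) - 1 >= k:             # path length (in hops) has reached k
        return False
    left_dominated = any(lo >= new[0] for lo, _ in known)
    right_dominated = any(hi >= new[1] for _, hi in known)
    # Nothing new to tell q if both endpoints are already dominated.
    return not (left_dominated and right_dominated)

def combine(known):
    """Step 3: take the maxima of the left and of the right endpoints."""
    return (max(lo for lo, _ in known), max(hi for _, hi in known))

clocks = [(10, 12)]
print(should_relay(clocks, (9, 11), ["j", "p"], "q", k=3))   # dominated
print(should_relay(clocks, (9, 13), ["j", "p"], "q", k=3))   # new right end
print(combine(clocks + [(9, 13)]))
```

Taking maxima on both sides reflects the assumption that faults can only delay messages: a delayed copy yields a clock that reads too low, so the highest endpoints seen come from the least delayed copies.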


Note that in the second condition of step 2, $U$ and $V$ could be the same
$p$-clock. This condition can be strengthened so that $p$ need not relay $R$
to $q$ if it knows that $q$ has already added to $\mathcal{C}_q$ a $q$-clock
approximately equal to $I_p^\pi$. For example, suppose $p$ received $R$ by
an Ethernet message and $q$ is on the same Ethernet. If one is willing to
assume benign failure modes for the Ethernet, then $p$ could assume that $q$
received the same message at approximately the same time, so there is no
need for $p$ to relay it. However, the resulting algorithm would then
tolerate only benign Ethernet faults.

Define the delay $\tau$ and the variance $\gamma$ of a Byzantine
Clock-Broadcast Algorithm to be the maxima of $\tau^\pi_{\max}$ and
$\gamma^\pi$, respectively, for all paths $\pi$ from $j$ of length at most
$k$. Suppose the clocks of any two nonfaulty processes differ by at most
$\zeta$. Since messages along faulty paths can be ignored, it is easy to see
that if a Byzantine clock broadcast is initiated by provider $j$ when its
clock equals $T$, then a process can ignore any message that reaches it over
a path $\pi$ of length $l$ when its clock reads later than
$T + \tau^\pi_{\max} + l\zeta$. Hence, each process $p$ can compute its
$p$-clock $UT_p^{(j)}$ at time $T + \tau + k\zeta$, where $\tau$ is the
delay of the algorithm.

Suppose that, in executing the Byzantine Clock-Broadcast Algorithm, $p$
receives $R$ along path $\pi$. Let $r$ be a node on this path such that
$\phi$ is the subpath of $\pi$ going from $j$ to $r$, and $\eta$ is the
subpath going from $r$ to $p$, so $p$ received $R$ because $r$ relayed $R$
along $\eta$. Suppose that $r$ also relays $R$ to $q$ along $\psi$. If $p$,
$q$, $r$, $\eta$, and $\psi$ are nonfaulty, then Proposition 2 implies that
if $p$ received $R$ at time $t$, then

    $d_u(I_p^{\phi\eta}(t), I_q^{\phi\psi}(t)) < \gamma^\eta + \gamma^\psi$

(neglecting terms such as $\rho_p \tau^\pi_{\max}$). The following result
follows easily from this.


Proposition 9 Assume that for every pair of nonfaulty processes $p$, $q$
there are paths $\phi\eta$ from Universal Time provider $j$ to $p$ and
$\phi\psi$ from $j$ to $q$ of length at most $k$ such that $\eta$ and $\psi$
are nonfaulty. If a Byzantine Clock-Broadcast algorithm with variance
$\gamma$ is started at time $t$ to broadcast an interval $U^{(j)}$, then:

1. If $p$ and $q$ are nonfaulty, then
   $d_u(UT_p^{(j)}(t), UT_q^{(j)}(t)) < 2\gamma$.

2. If $j$ and $p$ are nonfaulty and $UT(t) \in U^{(j)}$ then
   $UT(t) \in UT_p^{(j)}(t)$.

(neglecting terms of order $\rho_p \tau^\pi_{\max}$ for paths $\pi$ of
length at most $k$ from $j$ to $p$).


For any particular network, the hypothesis of Proposition 9 can be satisfied 
by making k large enough if every pair of nonfaulty processes are connected 
by some nonfaulty path. The choice of k and the actual set of channels to 
use for the broadcast will be a compromise between the conflicting desires 
to increase reliability and reduce the number of messages sent. 

The conclusion of Proposition 9 is almost but not quite in the form
necessary for implying the hypothesis of Proposition 8. This is because the
value $U_p^{(j)}$ used in the actual algorithm will be
$UT_p^{(j)}(t_p^{(i)})$ rather than $UT_p^{(j)}(t)$. However, the conclusion
of Proposition 9 remains valid after replacing $t$ by the values $t_p^{(i)}$
if we can neglect terms of order $\rho_p |t_p^{(i)} - t|$. These terms will
be negligibly small if the time $t$, when the clock broadcast is begun, is
close to the resynchronization time $t_p^{(i)}$. The broadcast needs to be
begun early enough so that every process $p$ receives its value before time
$t_p^{(i)}$, which is the time when its clock $C_p^{(i)}$ reads $T^{(i)}$.
If the $C_p^{(i)}$ satisfy the correct-time condition with bounds
$\epsilon_p$ and $\tau$ is the maximum of $\tau^\pi_{\max}$ for all paths
from $j$ of length $k$, then we get a minimum value for $|t_p^{(i)} - t|$ on
the order of $\epsilon_p + \tau$.


4.4 The Complete Algorithm 


We now have all the pieces necessary to construct an algorithm to compute
the $T_p$. First, one must select disjoint sets $P_i$ of processes such that
if $p$ and $q$ want to synchronize their times so that
$\delta_{pq} < \epsilon_p + \epsilon_q$, then they both lie within the same
set $P_i$. If the sets $P_i$ can change, then the algorithm of [3] is used
to guarantee that all nonfaulty processes agree upon the current collection
of sets.

A process $p$ not in any set $P_i$ simply chooses $T_p$ to be the midpoint
of $I_p$. All processes $p$ in the same set $P_i$ choose their service times
$T_p$ by the following algorithm.


Service Time Algorithm:

1. The processes in $P_i$ choose a set of Universal Time providers. The
   method of [3] is used to ensure that all processes in $P_i$ agree upon a
   set of providers that are thought to be nonfaulty and to provide suitable
   values.⁵ Let this set of providers be numbered from 1 to $m$.

2. For a sequence of predetermined times $T^{(i)}$ and time providers $j_i$,
   when the maximum (right-hand endpoint) of $UT^{(j_i)}$ equals
   $T^{(i)} - \tau - k(\chi + \rho H)$, provider $j_i$ executes the
   Byzantine Clock-Broadcast Algorithm, with $U^{(j_i)}$ equal to the
   current value of $UT^{(j_i)}$, to broadcast a $p$-clock $UT_p^{(j_i)}$ to
   every process $p$ in $P_i$, where

   • $\tau$ is the delay of the broadcast algorithm.

   • $\chi \ge \|UT_p^{(j)}\|/2$ for all $p$.

   • $H$ is a constant such that each provider $j$ broadcasts its value of
     $UT^{(j)}$ in this way at least once every $H$ seconds.

   • $k$ is the parameter of the broadcast algorithm.

3. Each process $p$ sets $C_p^{(i)}$ equal to the $p$-clock
   $A^f(UT_p^{(1)}, \ldots, UT_p^{(m)})$.

4. Process $p$ uses the Resynchronization Algorithm to compute $T_p$.


It follows from the correct-time condition in Proposition 10 below that
provider $j_i$ initiates its broadcast early enough so each process $p$ can
compute $C_p^{(i)}$ by the time its current ideal clock reaches $T^{(i)}$.

Propositions 6, 8, and 9 allow us to deduce that the $T_p$ satisfy the
correct-rate, synchronization, and correct-time conditions. However, the
bounds in these conditions become rather complex. Therefore, only the
simpler conditions for the ideal clocks $C_p^{(i)}$ are given; the
corresponding conditions for the service clocks $T_p$ are obtained from
these bounds by applying Proposition 6.


Proposition 10 If at most $f$ of the Universal Time providers are faulty,
then the clocks $C_p^{(i)}$ constructed in step 3 of the Service Time
Algorithm satisfy

• the clock-synchronization condition with bounds
  $2\gamma + (\rho_p + \rho_q)H$,

• the correct-time condition with bounds $\chi + \rho_p H$,

on the interval $[T^{(i)}, T^{(i+1)}]$ (neglecting terms of order
$\rho_p \tau$).

[5] Since agreement takes time, a provider will have to remain in the set of
chosen providers for some period of time after it is discovered to be faulty
before the processes agree to eliminate it. Thus, even if we assumed that
faulty Universal Time providers can be detected, the algorithm for choosing
T_p still has to tolerate faulty providers.
Glossary

C_p: Process p’s local clock.


C_p^(i): A mythical ideal clock that runs at the same rate as C_p and maintains
the “correct” time, as learned by p in the i-th resynchronization.


d_m: The midpoint pseudo-metric on (bounded) intervals; d_m(U, V) is
defined to be the distance between the midpoints of U and V.


d_u: The uniform metric on intervals; d_u([x, y], [v, w]) is defined to be the
maximum of |v − x| and |w − y|.
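
The two distance functions just defined can be sketched as follows. This is a minimal illustration that assumes an interval is represented as a (low, high) tuple; that representation and the helper name `midpoint` are choices made here, not taken from the text.

```python
def midpoint(interval):
    """Midpoint of a bounded interval given as a (low, high) tuple."""
    low, high = interval
    return (low + high) / 2.0


def d_m(U, V):
    """Midpoint pseudo-metric: distance between the midpoints of U and V."""
    return abs(midpoint(U) - midpoint(V))


def d_u(U, V):
    """Uniform metric: the larger of the two endpoint differences."""
    (x, y), (v, w) = U, V
    return max(abs(v - x), abs(w - y))


U = (10.0, 14.0)
V = (11.0, 13.0)
print(d_m(U, V), d_u(U, V))
```

Here U and V are different intervals with the same midpoint, so the midpoint distance is zero while the uniform distance is not. This is why the midpoint distance is only a pseudo-metric.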


e: 2.7182818284590452353602874713526624977572470936999595749669...


H: A parameter of the Service Time Algorithm, chosen to be a length of 
time such that each Universal Time provider broadcasts its value at 
least once every H seconds. 


i: Used as a superscript to denote a clock-synchronization event.


I_p: A clock range, maintained by time server p, that is guaranteed to contain
UT.


j: Used as a sub- or superscript to denote a Universal Time provider. 


J: A parameter of the Resynchronization Algorithm, chosen to be a length
of time such that there is at least one resynchronization every J seconds.
(At the i-th resynchronization, process p sets the running rate of
T_p so that T_p would equal C_p^(i) after exactly J seconds, in the absence
of further resynchronization.)
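
The rate adjustment described in the parenthetical remark can be sketched as below. This is a minimal illustration, not the Resynchronization Algorithm itself: it assumes both readings are taken at the same instant, that rates are measured relative to the local clock (so the ideal clock advances at rate 1), and the function name is hypothetical.

```python
def service_clock_rate(service_time, ideal_time, J):
    """Rate, relative to the local clock, at which the service time must run
    so that it equals the ideal clock after exactly J seconds.

    service_time -- current reading of the service time
    ideal_time   -- current reading of the ideal clock
    J            -- convergence period in seconds
    """
    if J <= 0:
        raise ValueError("J must be positive")
    return 1.0 + (ideal_time - service_time) / J


# Hypothetical readings: the service time is 0.5 s behind the ideal clock.
rate = service_clock_rate(service_time=100.0, ideal_time=100.5, J=50.0)

# After J seconds of local time both clocks show the same value.
service_after = 100.0 + rate * 50.0
ideal_after = 100.5 + 1.0 * 50.0
print(rate, service_after, ideal_after)
```

Spreading the correction over J seconds keeps the service time continuous and bounds its rate of change, at the cost of carrying a decreasing error during the convergence period.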


k: The maximum-length path by which messages travel in a Byzantine 
Clock-Broadcast Algorithm. 


Lipschitz condition: F satisfies a Lipschitz condition for pseudo-metric d
if d(U_i, V_i) ≤ δ for all i implies |F(U_1, ..., U_m) − F(V_1, ..., V_m)| ≤ δ.


pseudo-metric: A nonnegative function d such that: (i) d(U, V) = d(V, U),
(ii) d(U, V) + d(V, W) ≥ d(U, W), and (iii) d(U, U + ε) = |ε|.


R^π: The interval R + [τ^π_min, τ^π_max], where R is an interval and π is a path.


T_p: The time provided by time-service process p.


T^(i): The time of the i-th resynchronization (as read by the clocks C_p^(i)).


translation invariance: F is translation invariant if F(U_1 + x, ..., U_m + x)
= F(U_1, ..., U_m) + x.
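
As a small numerical check of the two properties defined above (the Lipschitz condition and translation invariance), the sketch below tests them on one sample function. The function F used here, the mean of the interval midpoints, is a hypothetical stand-in chosen for simplicity; it is not the fault-tolerant function used by the Service Time Algorithm. Intervals are assumed to be (low, high) tuples.

```python
def midpoint(interval):
    low, high = interval
    return (low + high) / 2.0


def d_m(U, V):
    """Midpoint pseudo-metric on intervals."""
    return abs(midpoint(U) - midpoint(V))


def shift(interval, x):
    """Sum of an interval and a number."""
    low, high = interval
    return (low + x, high + x)


def F(intervals):
    """Hypothetical example function: mean of the interval midpoints."""
    return sum(midpoint(iv) for iv in intervals) / len(intervals)


Us = [(0.0, 2.0), (1.0, 5.0), (2.0, 4.0)]
Vs = [(0.5, 2.5), (0.5, 5.0), (2.0, 5.0)]

# Lipschitz condition for d_m: if every pair is within delta, so are the results.
delta = max(d_m(u, v) for u, v in zip(Us, Vs))
lipschitz_ok = abs(F(Us) - F(Vs)) <= delta + 1e-12

# Translation invariance: shifting every argument by x shifts the result by x.
x = 3.0
translation_ok = abs(F([shift(u, x) for u in Us]) - (F(Us) + x)) < 1e-12

print(delta, lipschitz_ok, translation_ok)
```

A check on one pair of inputs only illustrates the definitions; it does not prove that a function satisfies them for all inputs.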


UT: Universal Time—the ideal standard, closely approximated by clocks 
at the National Bureau of Standards and other places throughout the 
world. 


U^(j): An interval containing UT broadcast by Universal Time provider j
during a particular synchronization.


U_p^(j): The interval obtained by process p when U^(j) is broadcast by
Universal Time provider j.


UT^(j): A clock range maintained by Universal Time provider j that contains
UT.


UT_p^(j): The p-clock obtained by process p when Universal Time provider j
broadcasts UT^(j).


1); The value of UT at which process p’s ideal clock reads T^(i).


γ: The variance of a Byzantine Clock-Broadcast Algorithm.


γ^c: When c is a channel, it equals τ^c_max − τ^c_min, the uncertainty in
message-transmission time over channel c. For a path π, γ^π is the sum of the
γ^c for all channels c in the path.


δ_pq: An upper bound on the difference between time values provided by
nodes p and q; e.g., an upper bound for |T_p − T_q|.


ε: An upper bound on half the width of UT^(j) for all nonfaulty Universal
Time providers j.


ε_p: An upper bound on the difference between Universal Time and a value
provided by process p; e.g., an upper bound on |UT − T_p|.


κ_p: An upper bound on the error in the rate of change of the service time
T_p provided by process p.


ρ_p: An upper bound on the error in the running rate of C_p.


σ_p: The maximum amount by which resynchronization can change p’s ideal
clocks C_p^(i) during any time interval of J seconds duration.


τ: The delay of a Byzantine Clock-Broadcast Algorithm.


τ^c_min: The minimum message delay for a message sent across channel c. (It
includes the time needed to generate the message.) For a path π, τ^π_min
is the sum of the minimum message delays for all channels in path π.


τ^c_max: The maximum message delay for a message sent across channel c
(including the time needed to generate the message). For a path π,
τ^π_max is the sum of the maximum message delays for all channels in
path π.
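
The per-path quantities in the last few entries are simple sums over channels, as the sketch below shows. It assumes each channel is given as a (minimum delay, maximum delay) pair in seconds; the function name and the delay values are hypothetical.

```python
def path_delays(channels):
    """Return (minimum delay, maximum delay, uncertainty) for a path.

    channels -- list of (min_delay, max_delay) pairs, one per channel
    The path uncertainty is the sum of the per-channel uncertainties
    max_delay - min_delay.
    """
    path_min = sum(lo for lo, _ in channels)
    path_max = sum(hi for _, hi in channels)
    uncertainty = sum(hi - lo for lo, hi in channels)
    return path_min, path_max, uncertainty


# Hypothetical three-channel path, delays in seconds.
path = [(0.002, 0.010), (0.001, 0.004), (0.003, 0.005)]
path_min, path_max, uncertainty = path_delays(path)
print(path_min, path_max, uncertainty)
```

Because each per-channel uncertainty is a difference of that channel's maximum and minimum delay, the path uncertainty always equals the path maximum minus the path minimum.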


χ: A parameter of the Service Time Algorithm, at least half the maximum
width of UT_p^(j) for every nonfaulty Universal Time provider j and
nonfaulty process p.


||...||: The width of an interval, defined by ||[x, y]|| = y − x.


+: The sum of two intervals is defined by [u, v] + [x, y] = [u + x, v + y]. The
sum of an interval and a number is defined by [u, v] + x = [u + x, v + x].


±: For an interval [u, v] and a number δ > 0, [u, v] ± δ is the interval
[u − δ, v + δ].
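
The interval operations defined in the last three entries (width, sum, and widening by a positive number) can be sketched as follows. Intervals are assumed to be (low, high) tuples, and the function names are hypothetical.

```python
def width(interval):
    """Width of an interval: high endpoint minus low endpoint."""
    low, high = interval
    return high - low


def add(interval, other):
    """Sum of an interval and either another interval or a number."""
    u, v = interval
    if isinstance(other, tuple):
        x, y = other
        return (u + x, v + y)
    return (u + other, v + other)


def widen(interval, delta):
    """Extend an interval by delta > 0 on both sides."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    u, v = interval
    return (u - delta, v + delta)


R = (1.0, 3.0)
print(width(R), add(R, (0.5, 1.0)), add(R, 2.0), widen(R, 0.5))
```

Adding a number shifts an interval without changing its width, whereas adding an interval or widening by a positive number increases the width.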